FUNDUS - The urban geography of inequalities: Budapest summer edition

The FUNDUS project of Urbanum Lab (a tech lab empowered by the interdisciplinary Urbanum Research Foundation), as its very title suggests, aims at laying the foundations for a globally applicable, open and accessible approach (including technological as well as methodological contributions) for the assessment of how economical factors of living (especially, real estate prices) are correlated with the quality of life in a certain urban environment, as captured by environmental and other social indicators.

In this notebook, the Budapest summer edition, we will be focusing on a simple, yet central question with manifold interpretations and consequences: do higher (average) real estate prices indicate a greener environment, less prone to the heat island phenomenon? (In simpler words: does more expensive mean greener and cooler in the summer?)

Outline

In this notebook, we are using Budapest as an example, and part of our data comes from our own scraper solution to obtain average property prices from real estate offers on the internet.

In particular, the notebook is structured as follows:

  • We present technical details on the data being used
  • we summarize the technical takeaways provided,
  • ...

How to reproduce our results

This notebook presents the first, exploratory data analysis phase of our project. We would like to get a glimpse into the data and its possible use. First, you have to run the property price scraper. You can find more info on getting the property prices data in the repository of the project. We assume that you have a WEkEO account, if this is not the case, please register here. It is a good practice to install all the project requirements using a separated Python 3.9.10 (or higher) virtual environment. Use requirements.txt to install all dependencies. Last, you have to configure your .hdarc file using your WEkEO credentials. This article shows you how to do this.

If you would like to adapt our notebook to a different municipality or for a different time horizon, use WEkEO's online data exploratory tool.

You might find useful the BoundingBox tool to get latitude and longitude coordinates of a given area.

Data used

WEkEO datasets

Dataset queries were generated by using the WEkEO online platform. The queries can be fund in the data/jsons folder

  • Global 10-daily Leaf Area Index 333m
    {
    "datasetId": "EO:CLMS:DAT:CGLS_GLOBAL_LAI300_V1_333M",
    "dateRangeSelectValues": [
      {
        "name": "dtrange",
        "start": "2022-06-01T00:00:00.000Z",
        "end": "2022-06-30T23:59:59.999Z"
      }
    ]
    }
    
  • Level 2 Land - Sea and Land Surface Temperature Radiometer (SLSTR) - Sentinel-3
    {
    "datasetId": "EO:ESA:DAT:SENTINEL-3:SL_2_LST___",
    "boundingBoxValues": [
      {
        "name": "bbox",
        "bbox": [
          18.99804053609134,
          47.42120186691113,
          19.190237776905892,
          47.58048586099437
        ]
      }
    ],
    "dateRangeSelectValues": [
      {
        "name": "position",
        "start": "2022-06-01T00:00:00.000Z",
        "end": "2022-06-30T00:00:00.000Z"
      }
    ],
    "stringChoiceValues": [
      {
        "name": "productType",
        "value": "LST"
      },
      {
        "name": "timeliness",
        "value": "Near+Real+Time"
      },
      {
        "name": "orbitDirection",
        "value": "ascending"
      },
      {
        "name": "processingLevel",
        "value": "LEVEL2"
      }
    ]
    }
    
  • Global 10-daily Fraction of Vegetation Cover 333m
    {
    "datasetId": "EO:CLMS:DAT:CGLS_GLOBAL_FCOVER300_V1_333M",
    "dateRangeSelectValues": [
      {
        "name": "dtrange",
        "start": "2022-06-01T00:00:00.000Z",
        "end": "2022-06-30T23:59:59.999Z"
      }
    ]
    }
    

External dataset

  • The square meter prices dataset was collected on 12th July 2022 using our freely available scraper. The data is in the data/aggragated folder. WARNING: Check the README.md file of the scraper to get your own data.
  • The POIs data was downloaded from OpenStreetMap using the pyrosm package. For more details see this repo
  • Thanks to JárókelÅ‘ - a platform for street maintenance and for communication between citizens and local administrations - we got data on various issues reported by volunteers. For more on this, see this repos

Technical takeaways

The notebook provides a novel methodology for exploratory data analysis in the field of urban digital geography, instantiated using a limited, yet appropriate example (Budapest, Hungary in the summer). At the end of this notebook, you will know:

  • How to aggregate data using the h3 library,
  • how to visualize the data using the pydeck package, and
  • how to investigate spatial discrepancies along property prices and environmental factors

Data acquisition

Library imports

Here, we import packages for the project.

In [7]:
!pip install h3 altair pydeck xarray hda
Requirement already satisfied: h3 in /opt/conda/lib/python3.8/site-packages (3.7.4)
Requirement already satisfied: altair in /opt/conda/lib/python3.8/site-packages (4.2.0)
Requirement already satisfied: pydeck in /opt/conda/lib/python3.8/site-packages (0.7.1)
Requirement already satisfied: xarray in /opt/conda/lib/python3.8/site-packages (2022.6.0)
Collecting hda
  Downloading hda-0.2.2.tar.gz (12 kB)
Requirement already satisfied: jsonschema>=3.0 in /opt/conda/lib/python3.8/site-packages (from altair) (3.2.0)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.8/site-packages (from altair) (2.11.2)
Requirement already satisfied: pandas>=0.18 in /opt/conda/lib/python3.8/site-packages (from altair) (1.4.3)
Requirement already satisfied: toolz in /opt/conda/lib/python3.8/site-packages (from altair) (0.12.0)
Requirement already satisfied: entrypoints in /opt/conda/lib/python3.8/site-packages (from altair) (0.3)
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from altair) (1.21.2)
Requirement already satisfied: ipykernel>=5.1.2; python_version >= "3.4" in /opt/conda/lib/python3.8/site-packages (from pydeck) (6.0.2)
Requirement already satisfied: ipywidgets>=7.0.0 in /opt/conda/lib/python3.8/site-packages (from pydeck) (7.6.4)
Requirement already satisfied: traitlets>=4.3.2 in /opt/conda/lib/python3.8/site-packages (from pydeck) (4.3.2)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.8/site-packages (from xarray) (20.4)
Requirement already satisfied: requests>=2.5.0 in /opt/conda/lib/python3.8/site-packages (from hda) (2.24.0)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.8/site-packages (from hda) (4.50.2)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.8/site-packages (from jsonschema>=3.0->altair) (49.6.0.post20201009)
Requirement already satisfied: attrs>=17.4.0 in /opt/conda/lib/python3.8/site-packages (from jsonschema>=3.0->altair) (20.2.0)
Requirement already satisfied: pyrsistent>=0.14.0 in /opt/conda/lib/python3.8/site-packages (from jsonschema>=3.0->altair) (0.17.3)
Requirement already satisfied: six>=1.11.0 in /opt/conda/lib/python3.8/site-packages (from jsonschema>=3.0->altair) (1.15.0)
Requirement already satisfied: MarkupSafe>=0.23 in /opt/conda/lib/python3.8/site-packages (from jinja2->altair) (1.1.1)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.8/site-packages (from pandas>=0.18->altair) (2022.2.1)
Requirement already satisfied: python-dateutil>=2.8.1 in /opt/conda/lib/python3.8/site-packages (from pandas>=0.18->altair) (2.8.1)
Requirement already satisfied: tornado<7.0,>=4.2 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (6.1)
Requirement already satisfied: jupyter-client<7.0 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (6.1.7)
Requirement already satisfied: debugpy<2.0,>=1.0.0 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (1.4.1)
Requirement already satisfied: ipython<8.0,>=7.23.1 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (7.25.0)
Requirement already satisfied: matplotlib-inline<0.2.0,>=0.1.0 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.1.2)
Requirement already satisfied: ipython-genutils~=0.2.0 in /opt/conda/lib/python3.8/site-packages (from ipywidgets>=7.0.0->pydeck) (0.2.0)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /opt/conda/lib/python3.8/site-packages (from ipywidgets>=7.0.0->pydeck) (3.5.1)
Requirement already satisfied: jupyterlab-widgets>=1.0.0; python_version >= "3.6" in /opt/conda/lib/python3.8/site-packages (from ipywidgets>=7.0.0->pydeck) (1.0.1)
Requirement already satisfied: nbformat>=4.2.0 in /opt/conda/lib/python3.8/site-packages (from ipywidgets>=7.0.0->pydeck) (5.0.8)
Requirement already satisfied: decorator in /opt/conda/lib/python3.8/site-packages (from traitlets>=4.3.2->pydeck) (4.4.2)
Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging>=20.0->xarray) (2.4.7)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests>=2.5.0->hda) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests>=2.5.0->hda) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests>=2.5.0->hda) (2021.5.30)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests>=2.5.0->hda) (1.25.11)
Requirement already satisfied: pyzmq>=13 in /opt/conda/lib/python3.8/site-packages (from jupyter-client<7.0->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (19.0.2)
Requirement already satisfied: jupyter-core>=4.6.0 in /opt/conda/lib/python3.8/site-packages (from jupyter-client<7.0->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (4.6.3)
Requirement already satisfied: backcall in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.2.0)
Requirement already satisfied: pickleshare in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (3.0.8)
Requirement already satisfied: pexpect>4.3; sys_platform != "win32" in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (4.8.0)
Requirement already satisfied: jedi>=0.16 in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.17.2)
Requirement already satisfied: pygments in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (2.7.1)
Requirement already satisfied: notebook>=4.4.1 in /opt/conda/lib/python3.8/site-packages (from widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (6.1.5)
Requirement already satisfied: wcwidth in /opt/conda/lib/python3.8/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.2.5)
Requirement already satisfied: ptyprocess>=0.5 in /opt/conda/lib/python3.8/site-packages (from pexpect>4.3; sys_platform != "win32"->ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.6.0)
Requirement already satisfied: parso<0.8.0,>=0.7.0 in /opt/conda/lib/python3.8/site-packages (from jedi>=0.16->ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.7.1)
Requirement already satisfied: Send2Trash in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (1.5.0)
Requirement already satisfied: prometheus-client in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.8.0)
Requirement already satisfied: terminado>=0.8.3 in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.9.1)
Requirement already satisfied: argon2-cffi in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (20.1.0)
Requirement already satisfied: nbconvert in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (5.6.0)
Requirement already satisfied: cffi>=1.0.0 in /opt/conda/lib/python3.8/site-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (1.14.3)
Requirement already satisfied: pandocfilters>=1.4.1 in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (1.4.2)
Requirement already satisfied: defusedxml in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.6.0)
Requirement already satisfied: bleach in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (3.2.1)
Requirement already satisfied: mistune<2,>=0.8.1 in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.8.4)
Requirement already satisfied: testpath in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.4.4)
Requirement already satisfied: pycparser in /opt/conda/lib/python3.8/site-packages (from cffi>=1.0.0->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (2.20)
Requirement already satisfied: webencodings in /opt/conda/lib/python3.8/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.5.1)
Building wheels for collected packages: hda
  Building wheel for hda (setup.py) ... done
  Created wheel for hda: filename=hda-0.2.2-py3-none-any.whl size=12598 sha256=1aaab2bb4083b96089d1ba1143fb2d571a4aaf6b0b185463172a285868c5d881
  Stored in directory: /tmp/pip-ephem-wheel-cache-3ijhop80/wheels/78/bb/4b/7ad7c8162841ecbef925db8090268bd8b9506d33617b4e7b34
Successfully built hda
Installing collected packages: hda
Successfully installed hda-0.2.2
In [9]:
import json
import os
from functools import reduce

import h3
import altair as at
import numpy as np
import pandas as pd
import pydeck as pdk
import xarray as xr
from hda import Client

Gathering the data

WARNING: Downloading the datasets takes time! The data will be downloaded into the current working directory. Use your favorite tools to move the downloaded files to the appropriate folders.

SLSTR

In [ ]:
c = Client(debug=True)
with open("../data/jsons/temperature.json") as infile:
    query = json.load(infile)
matches = c.search(query)
# matches.download()

LAI

In [ ]:
c = Client(debug=True)
with open("../data/jsons/lai.json") as infile:
    query = json.load(infile)
matches = c.search(query)
# matches.download()

FCOVER

In [ ]:
c = Client(debug=True)
with open("../data/jsons/fcover.json") as infile:
    query = json.load(infile)
matches = c.search(query)
# matches.download()

Your operating system and tools might be different, below you can read tips which might be useful on a Linux machine

  • The notebook assumes that all data is in the data folder.
  • Move all zip and nc files to the folder (e.g. you can use the mv command, like mv ../data "*.zip")
  • Make folders for the data fiels (cd data; mkdir leaf_data temp_data fcover)
  • Move the *.nc files to the corresponding folders (e.g. move Leaf Area Index data into leaf_data)
  • Move the .zip files into the temp_data folder `cd temp_data; unzip ".zip"; rm "*.zip"`

Cleaning and transforming data

Property square meter prices

We scraped a Hungarian real estate listing site to get property prices in Budapest. The listing entries were geocoded using the geocoder package. The geo-coordinates were indexed using the H3 hexagonal geospatial indexing system. You can check the resolution table of the cell areas here. For more details, you can check the repository of the scraper.

The data looks like this:

In [10]:
df5 = pd.read_csv("../data/aggregated/l5.tsv", sep="\t")
df6 = pd.read_csv("../data/aggregated/l6.tsv", sep="\t")
df7 = pd.read_csv("../data/aggregated/l7.tsv", sep="\t")
df8 = pd.read_csv("../data/aggregated/l8.tsv", sep="\t")

df7.head()
Out[10]:
l7 price
0 871e020caffffff 1.219298e+06
1 871e02684ffffff 1.200000e+06
2 871e030a9ffffff 1.508301e+06
3 871e03134ffffff 9.494950e+05
4 871e03449ffffff 1.197017e+06

The hexagons listed in these files constitues our area of interest.

Temperature data

The code below aggregates the average temperature data on various levels of H3 hashing and writes the results to a tsv file.

In [5]:
h3_l5 = set(df5["l5"])
h3_l6 = set(df6["l6"])
h3_l7 = set(df7["l7"])
h3_l8 = set(df8["l8"])

root_folder = "../data/temp_data"
dirs = [
    os.path.join(root_folder, d)
    for d in os.listdir(root_folder)
    if os.path.isdir(os.path.join(root_folder, d))
]


def is_within_bounding_box(lat, long):
    if 47.392134 < lat < 47.601216 and 18.936234 < long < 19.250031:
        return True
    else:
        return False


latlong_temp = {}
for inpath in dirs:
    # geodetic_tx.nc -> latitude_tx, longitude_tx
    geodetic = xr.open_dataset(
        filename_or_obj=os.path.join(inpath, "geodetic_tx.nc"), engine="netcdf4"
    )
    lat = geodetic.data_vars["latitude_tx"].to_numpy().flatten()
    long = geodetic.data_vars["longitude_tx"].to_numpy().flatten()
    # met_tx.nc -> temperature_tx
    met_tx = xr.open_dataset(
        filename_or_obj=os.path.join(inpath, "met_tx.nc"), engine="netcdf4"
    )
    temp = met_tx.data_vars["temperature_tx"].to_numpy().flatten()
    # LST_ancillary_ds.nc -> NDVI (empyt :()
    lst = xr.open_dataset(
        filename_or_obj=os.path.join(inpath, "LST_ancillary_ds.nc"), engine="netcdf4"
    )
    ndvi = lst.data_vars["NDVI"].to_numpy().flatten()

    temp_data = zip(lat, long, temp)
    temp_data = (e for e in temp_data if is_within_bounding_box(e[0], e[1]))
    for e in temp_data:
        k = (e[0], e[1])
        if latlong_temp.get(k, False):
            latlong_temp[k] = (latlong_temp[k] + e[2]) / 2
        else:
            latlong_temp[k] = e[2]

with open("../data/temp_budapest.tsv", "w") as outfile:
    h = "lat\tlong\tcelsius\tl5\tl6\tl7\tl8\n"
    outfile.write(h)
    for k, v in latlong_temp.items():
        l5 = h3.geo_to_h3(k[0], k[1], 5)
        l6 = h3.geo_to_h3(k[0], k[1], 6)
        l7 = h3.geo_to_h3(k[0], k[1], 7)
        l8 = h3.geo_to_h3(k[0], k[1], 8)
        if l5 in h3_l5 and l6 in h3_l6 and l7 in h3_l7 and l8 in h3_l8:
            o = (
                str(k[0])
                + "\t"
                + str(k[1])
                + "\t"
                + str(v - 273.15)
                + "\t"
                + l5
                + "\t"
                + l6
                + "\t"
                + l7
                + "\t"
                + l8
                + "\n"
            )
            outfile.write(o)

Global 10-daily Leaf Area Index 333m

The code below computes the average LAI and assigns H3 hash codes to the values. The results will be saved into a tsv file.

In [6]:
root_folder = "../data/leaf_data"
fs = [
    os.path.join(root_folder, f)
    for f in os.listdir(root_folder)
    if os.path.isfile(os.path.join(root_folder, f))
]

ll2lai = {}

for f in fs:
    try:
        ds = xr.open_dataset(filename_or_obj=os.path.join(f), engine="netcdf4")
        lat = ds.data_vars["LAI"]["lat"].to_numpy()
        lat = [e for e in lat if 47.392134 < e < 47.601216]
        lon = ds.data_vars["LAI"]["lon"].to_numpy()
        lon = [e for e in lon if 18.936234 < e < 19.250031]
        time = ds.data_vars["LAI"]["time"].to_numpy()[0]
        for i in range(len(lat)):
            for j in range(len(lon)):
                one_point = ds["LAI"].sel(lat=lat[i], lon=lon[i])
                vals = one_point.values[0]
                if ll2lai.get((lat[i], lon[j]), False):
                    ll2lai[(lat[i], lon[j])] = (ll2lai[(lat[i], lon[j])] + vals) / 2.0
                else:
                    ll2lai[(lat[i], lon[j])] = vals
    except Exception as exc1:
        print(exc1)
        continue

with open("../data/lai_budapest.tsv", "w") as outfile:
    h = "lat\tlong\tlai\tl5\tl6\tl7\tl8\n"
    outfile.write(h)
    for k, v in ll2lai.items():
        h5 = h3.geo_to_h3(k[0], k[1], 5)
        h6 = h3.geo_to_h3(k[0], k[1], 6)
        h7 = h3.geo_to_h3(k[0], k[1], 7)
        h8 = h3.geo_to_h3(k[0], k[1], 8)
        if h5 in h3_l5 and h6 in h3_l6 and h7 in h3_l7 and h8 in h3_l8:
            o = (
                str(k[0])
                + "\t"
                + str(k[1])
                + "\t"
                + str(v)
                + "\t"
                + str(h5)
                + "\t"
                + str(h6)
                + "\t"
                + str(h7)
                + "\t"
                + str(h8)
                + "\n"
            )
            outfile.write(o)

Global 10-daily Fraction of Vegetation Cover 333m

The code below computes the average FCOVER and assigns H3 hash codes to the values. The results will be saved into a tsv file.

In [7]:
root_folder = "../data/fcover"
fs = [
    os.path.join(root_folder, f)
    for f in os.listdir(root_folder)
    if os.path.isfile(os.path.join(root_folder, f))
]


def is_within_bounding_box(lat, long):
    if 47.392134 < lat < 47.601216 and 18.936234 < long < 19.250031:
        return True
    else:
        return False


ll2fcover = {}

for f in fs:
    try:
        ds = xr.open_dataset(filename_or_obj=os.path.join(f), engine="netcdf4")
        lat = ds.data_vars["FCOVER"]["lat"].to_numpy()
        lat = [e for e in lat if 47.392134 < e < 47.601216]
        lon = ds.data_vars["FCOVER"]["lon"].to_numpy()
        lon = [e for e in lon if 18.936234 < e < 19.250031]
        time = ds.data_vars["FCOVER"]["time"].to_numpy()[0]
        for i in range(len(lat)):
            for j in range(len(lon)):
                one_point = ds["FCOVER"].sel(lat=lat[i], lon=lon[i])
                vals = one_point.values[0]
                if ll2fcover.get((lat[i], lon[j]), False):
                    ll2fcover[(lat[i], lon[j])] = (
                        ll2fcover[(lat[i], lon[j])] + vals
                    ) / 2.0
                else:
                    ll2fcover[(lat[i], lon[j])] = vals
    except Exception as exc1:
        print(exc1)
        continue

with open("../data/fcover_budapest.tsv", "w") as outfile:
    h = "lat\tlong\tfcover\tl5\tl6\tl7\tl8\n"
    outfile.write(h)
    for k, v in ll2fcover.items():
        h5 = h3.geo_to_h3(k[0], k[1], 5)
        h6 = h3.geo_to_h3(k[0], k[1], 6)
        h7 = h3.geo_to_h3(k[0], k[1], 7)
        h8 = h3.geo_to_h3(k[0], k[1], 8)
        if h5 in h3_l5 and h6 in h3_l6 and h7 in h3_l7 and h8 in h3_l8:
            o = (
                str(k[0])
                + "\t"
                + str(k[1])
                + "\t"
                + str(v)
                + "\t"
                + str(h5)
                + "\t"
                + str(h6)
                + "\t"
                + str(h7)
                + "\t"
                + str(h8)
                + "\n"
            )
            outfile.write(o)

Visualizing the data

Maps

Square meter prices

In [11]:
df_price = pd.read_csv("../data/aggregated/l7.tsv", sep="\t")
df_price["normalized"] = 255 - (df_price["price"] / np.sqrt(np.sum(df_price["price"] ** 2)) * 1000)
layer = pdk.Layer(
    "H3HexagonLayer",
    df_price,
    get_hexagon="l7",
    auto_highlight=True,
    # elevation_scale=10,
    pickable=True,
    # elevation_range=[min(df["price"]), max(df["price"])],
    extruded=True,
    coverage=0.8,
    opacity=0.01,
    get_fill_color="[255, normalized, 0]",
)

view_state = pdk.ViewState(
    latitude=47.500000, longitude=19.040236, zoom=10.5, bearing=0, pitch=35
)
r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    tooltip={"text": "square meter price: {price}"},
)
r.to_html("../vizs/maps/prices_h7.html")
Out[11]:

Temperature

In [12]:
df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
df.fillna(0, inplace=True)

df_temp = df.groupby("l7").mean()
df_temp.reset_index(inplace=True, level=["l7"])
df_temp["rescaled"] = [255 - ((e**3)/100) for e in df_temp["celsius"]]
layer = pdk.Layer(
    "H3HexagonLayer",
    df_temp,
    get_hexagon="l7",
    auto_highlight=True,
    pickable=True,
    extruded=True,
    coverage=0.8,
    opacity=0.05,
    get_fill_color="[255, rescaled, 0]",
)

view_state = pdk.ViewState(
    latitude=47.500000, longitude=19.040236, zoom=10.5, bearing=0, pitch=35
)
r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    tooltip={"text": "temperature (celsius): {celsius}"},
)
r.to_html("../vizs/maps/temperature_h7.html")
Out[12]:

Leaf Area Index

In [13]:
df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
df.fillna(0, inplace=True)

df_lai = df.groupby("l7").mean()
df_lai.reset_index(inplace=True, level=["l7"])
layer = pdk.Layer(
    "H3HexagonLayer",
    df_lai,
    get_hexagon="l7",
    auto_highlight=True,
    pickable=True,
    extruded=True,
    coverage=0.9,
    opacity=0.05,
    get_fill_color="[255, 255 - (lai * 100), 0]"
)

view_state = pdk.ViewState(
    latitude=47.500000, longitude=19.040236, zoom=10.5, bearing=0, pitch=35
)
r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    tooltip={"text": "Leaf Area Index: {lai}"},
)
r.to_html("../vizs/maps/lai_h7.html")
Out[13]:

Fraction of Vegetation Cover

In [14]:
df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")
df.fillna(0.0, inplace=True)

df_fcover = df.groupby("l7").mean()
df_fcover.reset_index(inplace=True, level=["l7"])
df_fcover["normalized"] = (df_fcover["fcover"] / np.sqrt(np.sum(df_fcover["fcover"] ** 5))) ** -2
df_fcover["normalized"][df_fcover["normalized"] == np.inf] = 255
layer = pdk.Layer(
    "H3HexagonLayer",
    df_fcover,
    get_hexagon="l7",
    auto_highlight=True,
    pickable=True,
    extruded=True,
    coverage=0.8,
    opacity=0.05,
    get_fill_color="[255, normalized, 0]",
)

view_state = pdk.ViewState(
    latitude=47.500000, longitude=19.040236, zoom=10.5, bearing=0, pitch=35
)
r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    tooltip={"text": "Fraction of Vegetation Cover: {fcover}"},
)
r.to_html("../vizs/maps/fcover_h7.html")
/tmp/ipykernel_160/3755050406.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_fcover["normalized"][df_fcover["normalized"] == np.inf] = 255
Out[14]:

Analysis

We would like to test the common conception that wealthier neighborhoods are greener and also enjoy a lower average temperature during summer. Since we have no data on wealth at this granularity, we use property square meter prices as a proxy of wealth. This is a strong and yet not often assessed assumption. In particular, we test if

  • temperature and greenness,
  • temperature and square meter prices,
  • greenness and square meter prices

are connected.

We present our findings as interactive visualizations. We start with a realtively fine H3 resolution (7), giving us a fairly tight covergae of the geographical are under investigation. However, as we notice that the geographical resolution might not match the resolution of economical data, we also experiment with lower H3 resolutions.

H3 level 7

In [15]:
at.renderers.enable('default')

data_frames = [df_price, df_temp, df_lai, df_fcover]
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['l7'],
                                            how='outer'), data_frames)
df_merged.dropna(inplace=True)
df_merged.drop(columns=["normalized_x", "lat_x", "long_x", "rescaled", "lat_y",
                "long_y", "lat", "long", "normalized_y"], inplace=True)
df_merged.head()
Out[15]:
l7 price celsius lai fcover
4 871e03449ffffff 1.197017e+06 19.883478 4.093056 0.736756
5 871e0344affffff 9.605164e+05 22.373126 3.321111 0.779938
6 871e0344bffffff 1.556710e+06 18.937453 3.557240 0.784969
7 871e03459ffffff 1.115068e+06 20.933828 3.778786 0.813342
9 871e0345dffffff 1.449016e+06 21.627415 3.191146 0.775750
In [16]:
cor_data = (df_merged.corr().stack().reset_index().rename(columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data['correlation_label'] = cor_data['correlation'].map('{:.2f}'.format)  # Round to 2 decimal
cor_data
Out[16]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius 0.082898 0.08
2 price lai 0.130627 0.13
3 price fcover 0.085454 0.09
4 celsius price 0.082898 0.08
5 celsius celsius 1.000000 1.00
6 celsius lai -0.064254 -0.06
7 celsius fcover -0.050965 -0.05
8 lai price 0.130627 0.13
9 lai celsius -0.064254 -0.06
10 lai lai 1.000000 1.00
11 lai fcover 0.958244 0.96
12 fcover price 0.085454 0.09
13 fcover celsius -0.050965 -0.05
14 fcover lai 0.958244 0.96
15 fcover fcover 1.000000 1.00
In [17]:
base = at.Chart(cor_data).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=700,
    height=700
)

cor_plot + text
Out[17]:

H3 level 6

Since our price data is collected on street name level, maybe we should use a lower resolution.

In [18]:
price_h3_df = pd.read_csv("../data/aggregated/l6.tsv", sep="\t")
temp_df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
lai_df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
fcover_df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")

temp_h3_df = temp_df.groupby("l6").mean()
temp_h3_df.reset_index(inplace=True, level=["l6"])
lai_h3_df = lai_df.groupby("l6").mean()
lai_h3_df.reset_index(inplace=True, level=["l6"])
fcover_h3_df = fcover_df.groupby("l6").mean()
fcover_h3_df.reset_index(inplace=True, level=["l6"])

h3_data_frames = [price_h3_df, temp_h3_df, lai_h3_df, fcover_h3_df]
df_merged_h3 = reduce(lambda  left,right: pd.merge(left,right,on=['l6'],
                                                   how='outer'), h3_data_frames)
df_merged_h3.dropna(inplace=True)
df_merged_h3.drop(columns=["lat_x", "long_x", "lat_y", "long_y", "lat", "long"], inplace=True)
df_merged_h3.head()
Out[18]:
l6 price celsius lai fcover
4 861e0344fffffff 1.149972e+06 21.141896 3.661387 0.775320
5 861e0345fffffff 1.071476e+06 21.493805 3.573483 0.794305
7 861e03607ffffff 8.155039e+05 20.747444 0.731667 0.253750
9 861e03617ffffff 7.243512e+05 20.387555 1.173775 0.331828
10 861e0361fffffff 8.734682e+05 22.131759 0.756120 0.290160
In [19]:
cor_data_h3 = (df_merged_h3.corr().stack().reset_index().rename(
    columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data_h3['correlation_label'] = cor_data_h3['correlation'].map('{:.2f}'.format)
cor_data_h3
Out[19]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius -0.056756 -0.06
2 price lai 0.205356 0.21
3 price fcover 0.187743 0.19
4 celsius price -0.056756 -0.06
5 celsius celsius 1.000000 1.00
6 celsius lai -0.218399 -0.22
7 celsius fcover -0.248033 -0.25
8 lai price 0.205356 0.21
9 lai celsius -0.218399 -0.22
10 lai lai 1.000000 1.00
11 lai fcover 0.967857 0.97
12 fcover price 0.187743 0.19
13 fcover celsius -0.248033 -0.25
14 fcover lai 0.967857 0.97
15 fcover fcover 1.000000 1.00
In [20]:
base = at.Chart(cor_data_h3).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=700,
    height=700
)

cor_plot + text
Out[20]:

H3 Level 5

In [21]:
price_h3_df = pd.read_csv("../data/aggregated/l5.tsv", sep="\t")
temp_df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
lai_df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
fcover_df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")

temp_h3_df = temp_df.groupby("l5").mean()
temp_h3_df.reset_index(inplace=True, level=["l5"])
lai_h3_df = lai_df.groupby("l5").mean()
lai_h3_df.reset_index(inplace=True, level=["l5"])
fcover_h3_df = fcover_df.groupby("l5").mean()
fcover_h3_df.reset_index(inplace=True, level=["l5"])

h3_data_frames = [price_h3_df, temp_h3_df, lai_h3_df, fcover_h3_df]
df_merged_h3 = reduce(lambda  left,right: pd.merge(left,right,on=['l5'],
                                                   how='outer'), h3_data_frames)
df_merged_h3.dropna(inplace=True)
df_merged_h3.drop(columns=["lat_x", "long_x", "lat_y", "long_y", "lat", "long"], inplace=True)
df_merged_h3.head()
Out[21]:
l5 price celsius lai fcover
4 851e0347fffffff 1.126057e+06 21.151154 3.594296 0.789122
6 851e0363fffffff 7.993279e+05 20.890202 1.467126 0.421782
7 851e036bfffffff 7.341340e+05 20.802957 1.497615 0.481584
8 851e0373fffffff 9.410714e+05 20.786614 3.598063 0.761485
10 851e037bfffffff 1.072051e+06 21.231511 1.795492 0.469863
In [22]:
cor_data_h3 = (df_merged_h3.corr().stack().reset_index().rename(
    columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data_h3['correlation_label'] = cor_data_h3['correlation'].map('{:.2f}'.format)
cor_data_h3
Out[22]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius 0.828449 0.83
2 price lai 0.637665 0.64
3 price fcover 0.538340 0.54
4 celsius price 0.828449 0.83
5 celsius celsius 1.000000 1.00
6 celsius lai 0.115429 0.12
7 celsius fcover 0.033319 0.03
8 lai price 0.637665 0.64
9 lai celsius 0.115429 0.12
10 lai lai 1.000000 1.00
11 lai fcover 0.967581 0.97
12 fcover price 0.538340 0.54
13 fcover celsius 0.033319 0.03
14 fcover lai 0.967581 0.97
15 fcover fcover 1.000000 1.00
In [23]:
base = at.Chart(cor_data_h3).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=700,
    height=700
)

cor_plot + text
Out[23]:

Integrating other data sources

OSM main features and Jarokelo

In [24]:
price_h3_df = pd.read_csv("../data/aggregated/l5.tsv", sep="\t")
temp_df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
lai_df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
fcover_df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")

temp_h3_df = temp_df.groupby("l5").mean()
temp_h3_df.reset_index(inplace=True, level=["l5"])
lai_h3_df = lai_df.groupby("l5").mean()
lai_h3_df.reset_index(inplace=True, level=["l5"])
fcover_h3_df = fcover_df.groupby("l5").mean()
fcover_h3_df.reset_index(inplace=True, level=["l5"])

osm_pois = pd.read_csv("../data/osm/key_l5.tsv", sep="\t")
osm_pois = osm_pois.pivot_table(values="0", index=osm_pois.l5, columns="key", aggfunc="first")
osm_pois.reset_index(inplace=True, level=["l5"])
osm_pois.fillna(0, inplace=True)

jarokelo = pd.read_csv("../data/jarokelo/jarokelo_l5.tsv", sep="\t")
jarokelo = jarokelo.pivot_table(values="0", index=jarokelo.l5, columns="Category", aggfunc="first")
jarokelo.reset_index(inplace=True, level=["l5"])
jarokelo.fillna(0, inplace=True)

h3_data_frames = [price_h3_df, temp_h3_df, lai_h3_df, fcover_h3_df, osm_pois, jarokelo]
df_merged_h3 = reduce(lambda  left,right: pd.merge(left,right,on=['l5'],
                                                   how='outer'), h3_data_frames)
df_merged_h3.dropna(inplace=True)
df_merged_h3.drop(columns=["lat_x", "long_x", "lat_y", "long_y", "lat", "long"], inplace=True)
df_merged_h3.head()
Out[24]:
l5 price celsius lai fcover amenity building craft landuse office ... Járda Kerékpárút Kátyú Közművek Közvilágítás Parkok és zöldterületek Parkolás Szemét Sziget Fesztivál Tömegközlekedés
4 851e0347fffffff 1.126057e+06 21.151154 3.594296 0.789122 1337.0 7.0 2.0 0.0 2.0 ... 5.0 0.0 10.0 4.0 1.0 14.0 11.0 6.0 0.0 2.0
6 851e0363fffffff 7.993279e+05 20.890202 1.467126 0.421782 2511.0 6.0 4.0 0.0 3.0 ... 76.0 15.0 89.0 183.0 83.0 160.0 17.0 220.0 0.0 47.0
7 851e036bfffffff 7.341340e+05 20.802957 1.497615 0.481584 906.0 2.0 1.0 0.0 0.0 ... 23.0 2.0 20.0 49.0 18.0 78.0 7.0 52.0 0.0 30.0
8 851e0373fffffff 9.410714e+05 20.786614 3.598063 0.761485 3768.0 10.0 4.0 0.0 4.0 ... 98.0 27.0 129.0 167.0 119.0 339.0 64.0 259.0 1.0 58.0
10 851e037bfffffff 1.072051e+06 21.231511 1.795492 0.469863 23740.0 25.0 38.0 1.0 29.0 ... 408.0 201.0 288.0 935.0 272.0 1158.0 218.0 1139.0 1.0 291.0

5 rows × 25 columns

In [25]:
cor_data_h3 = (df_merged_h3.corr().stack().reset_index().rename(
    columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data_h3['correlation_label'] = cor_data_h3['correlation'].map('{:.2f}'.format)
cor_data_h3
Out[25]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius 0.828449 0.83
2 price lai 0.637665 0.64
3 price fcover 0.538340 0.54
4 price amenity 0.502131 0.50
... ... ... ... ...
571 Tömegközlekedés Parkok és zöldterületek 0.992869 0.99
572 Tömegközlekedés Parkolás 0.984545 0.98
573 Tömegközlekedés Szemét 0.996508 1.00
574 Tömegközlekedés Sziget Fesztivál 0.725058 0.73
575 Tömegközlekedés Tömegközlekedés 1.000000 1.00

576 rows × 4 columns

In [26]:
base = at.Chart(cor_data_h3).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=700,
    height=700
)

cor_plot + text
Out[26]:

OSM main subfeatures and Jarokelo

In [27]:
price_h3_df = pd.read_csv("../data/aggregated/l5.tsv", sep="\t")
temp_df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
lai_df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
fcover_df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")

temp_h3_df = temp_df.groupby("l5").mean()
temp_h3_df.reset_index(inplace=True, level=["l5"])
lai_h3_df = lai_df.groupby("l5").mean()
lai_h3_df.reset_index(inplace=True, level=["l5"])
fcover_h3_df = fcover_df.groupby("l5").mean()
fcover_h3_df.reset_index(inplace=True, level=["l5"])

osm_pois = pd.read_csv("../data/osm/value_l5.tsv", sep="\t")
osm_pois = osm_pois.pivot_table(values="0", index=osm_pois.l5, columns="value", aggfunc="first")
osm_pois.reset_index(inplace=True, level=["l5"])
osm_pois.fillna(0, inplace=True)

jarokelo = pd.read_csv("../data/jarokelo/jarokelo_l5.tsv", sep="\t")
jarokelo = jarokelo.pivot_table(values="0", index=jarokelo.l5, columns="Category", aggfunc="first")
jarokelo.reset_index(inplace=True, level=["l5"])
jarokelo.fillna(0, inplace=True)

h3_data_frames = [price_h3_df, temp_h3_df, lai_h3_df, fcover_h3_df, osm_pois, jarokelo]
df_merged_h3 = reduce(lambda  left,right: pd.merge(left,right,on=['l5'],
                                                   how='outer'), h3_data_frames)
df_merged_h3.dropna(inplace=True)
df_merged_h3.drop(columns=["lat_x", "long_x", "lat_y", "long_y", "lat", "long"], inplace=True)
df_merged_h3.head()
Out[27]:
l5 price celsius lai fcover air_filling animal_boarding animal_shelter animal_training arts_centre ... Járda Kerékpárút Kátyú Közművek Közvilágítás Parkok és zöldterületek Parkolás Szemét Sziget Fesztivál Tömegközlekedés
4 851e0347fffffff 1.126057e+06 21.151154 3.594296 0.789122 0.0 1.0 1.0 1.0 0.0 ... 5.0 0.0 10.0 4.0 1.0 14.0 11.0 6.0 0.0 2.0
6 851e0363fffffff 7.993279e+05 20.890202 1.467126 0.421782 0.0 5.0 1.0 2.0 0.0 ... 76.0 15.0 89.0 183.0 83.0 160.0 17.0 220.0 0.0 47.0
7 851e036bfffffff 7.341340e+05 20.802957 1.497615 0.481584 0.0 0.0 0.0 1.0 1.0 ... 23.0 2.0 20.0 49.0 18.0 78.0 7.0 52.0 0.0 30.0
8 851e0373fffffff 9.410714e+05 20.786614 3.598063 0.761485 0.0 1.0 0.0 3.0 0.0 ... 98.0 27.0 129.0 167.0 119.0 339.0 64.0 259.0 1.0 58.0
10 851e037bfffffff 1.072051e+06 21.231511 1.795492 0.469863 2.0 5.0 1.0 2.0 27.0 ... 408.0 201.0 288.0 935.0 272.0 1158.0 218.0 1139.0 1.0 291.0

5 rows × 211 columns

In [28]:
cor_data_h3 = (df_merged_h3.corr().stack().reset_index().rename(
    columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data_h3['correlation_label'] = cor_data_h3['correlation'].map('{:.2f}'.format)
cor_data_h3
Out[28]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius 0.828449 0.83
2 price lai 0.637665 0.64
3 price fcover 0.538340 0.54
4 price air_filling 0.486479 0.49
... ... ... ... ...
43676 Tömegközlekedés Parkok és zöldterületek 0.992869 0.99
43677 Tömegközlekedés Parkolás 0.984545 0.98
43678 Tömegközlekedés Szemét 0.996508 1.00
43679 Tömegközlekedés Sziget Fesztivál 0.725058 0.73
43680 Tömegközlekedés Tömegközlekedés 1.000000 1.00

43681 rows × 4 columns

In [29]:
from altair import pipe, limit_rows, to_values
t = lambda data: pipe(data, limit_rows(max_rows=50000), to_values)
at.data_transformers.register('custom', t)
at.data_transformers.enable('custom')

base = at.Chart(cor_data_h3).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=10000,
    height=10000
)

cor_plot + text
/opt/conda/lib/python3.8/site-packages/altair/utils/data.py:226: AltairDeprecationWarning: alt.pipe() is deprecated, and will be removed in a future release. Use toolz.curried.pipe() instead.
  warnings.warn(
Out[29]:

Discussion and conclusion

Here, we checked if there is a connection between property prices and environmental factors. In particular, we wanted to see if having a more expensive property in Budapest, Hungary correlates with a greener environment and a cooler summer. We used WEkEO data to get information on the environment and we scraped property prices to supplement the data. Interestingly, using lower (but still not too coarse) geographical resolutions provided us with some notable (even if not very surprising) correlations between more expensive properties and the greenness of its environment (using both the LAI and the FCOVER measure for greenness. However, we could not verify any significant direct connection between higher prices and more acceptable temperature conditions! This might be a first step towards an important future investigation: if consequences of climate change can be effectively tackled on individual level by spending more money. (Maybe they cannot.)

Future directions

In the future, we would like to increase the time window to get more LAI and FCOVER data. We investigate the possibility to incorporate other datasets like OSM and get official statistics on house prices and/or income, widening the scope of the FUNDUS project, to finally turn it into a full-fledged, environmentally and economically conscious qualitiy-of-life assessment approach. As already mentioned, a particular and potentially very interesting research question arising from the present notebook: can money effectively battle the effects of climate crisis in our direct surroundings? In order to learn more, we shall primarily proceed by expanding the geographical scope of the present notebook, and maybe even considering further economical factors. Here, the scarcity of open data is still a major impediment, but we hope we can contribute to the improvement of the situation with our endeavor.